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The Markov entropy decomposition (MED) is a recently-proposed, cluster-based simulation method for fi- 
nite temperature quantum systems with arbitrary geometry. In this paper, we detail numerical algorithms for 
performing the required steps of the MED, principally solving a minimization problem with a preconditioned 
Newton's algorithm, as well as how to extract global susceptibilities and thermal responses. We demonstrate 
the power of the method with the spin- 1/2 XXZ model on the 2D square lattice, including the extraction of 
critical points and details of each phase. Although the method shares some qualitative similarities with exact- 
diagonalization, we show the MED is both more accurate and significantly more flexible. 
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I. INTRODUCTION 

Although the equations governing quantum many-body 
systems are simple to write down, finding solutions for the 
majority of systems remains incredibly difficult. Modern 
physics finds itself in need of new tools to compute the emer- 
gent behavior of large, many-body systems. 

There has been a great variety of tools developed to tackle 
many-body problems, but in general, large 2D and 3D quan- 
tum systems remain hard to deal with. Most systems are 
thought to be non-integrable, so exact analytic solutions are 
not usually expected. Direct numerical diagonalization can be 
performed for relatively small systems — however the emer- 
gent behavior of a system in the thermodynamic limit may be 
difficult to extract, especially in systems with large correlation 
lengths. Monte Carlo approaches are technically exact (up to 
sampling error), but suffer from the so-called sign problem 
for fermionic, frustrated, or dynamical problems. Thus we are 
limited to search for clever approximations to solve the ma- 
jority of many-body problems. 

Over the past century, hundreds of such approximations 
have been proposed, and we will mention just a few notable 
examples applicable to quantum lattice models. Mean-field 
theory is simple and frequently arrives at the correct quali- 
tative description, but often fails when correlations are im- 
portant. Density-matrix renormalisation group (DMRG) ^ 
is efficient and extremely accurate at solving ID problems, 
but the computational cost grows exponentially with system 
size in two- or higher-dimensions ^ [st]. Related tensor- 
network techniques designed for 2D systems are still in their 
infancy 10-01 ■ Series-expansion methods 101 can be success- 
ful, but may diverge or otherwise converge slowly, obscuring 
the state in certain regimes. There exist a variety of cluster- 
based techniques, such as dynamical-mean-field theory 
and density-matrix embedding 131. 

Here we discuss the so-called Markov entropy decompo- 
sition (MED), recently proposed by Poulin & Hastings lUOll 
(and analogous to a slightly earlier classical algorithm uVA ). 
This is a self-consistent cluster method for finite temperature 
systems that takes advantage of an approximation of the (von 
Neumann) entropy. In [10], it was shown that the entropy 
per site can be rigorously upper bounded using only local in- 
formation — a local, reduced density matrix on N sites, say. 



This approximation becomes exact in the case of a ID quan- 
tum (or classical) Markov chain ifioll . and leads to an expo- 
nential reduction of cost for exact entropy calculations when 
the global density matrix is a higher-dimensional Markov net- 
work state iHQj. 

The second approximation used in the MED approach is 
related to the A^-representibility problem. Given a set of lo- 
cal but overlapping reduced density matrices {pi}, it is a very 
challenging problem to determine if there exists a global den- 
sity operator which is positive semi-definite and whose partial 
trace agrees with each pi. This problem is QMA-hard (the 
quantum analogue of NP) ifTil [Tsll . and is hopelessly diffi- 
cult to enforce. Thus, the second approximation employed 
involves ignoring global consistency with a positive opera- 
tor, while requiring local consistency on any overlapping re- 
gions between the pi. At the zero-temperature limit, the MED 
approach becomes analogous to the variational nth-order re- 
duced density matrix approach, where positivity is enforced 
on all reduced density matrices of size n lllwisll . 

The MED approach is an extremely flexible cluster method, 
applicable to both translationally invariant systems of any di- 
mension in the thermodynamic limit, as well as finite systems 
or systems without translational invariance (e.g. disordered 
lattices, or harmonically trapped atoms in optical lattices). 
The free energy given by MED is guaranteed to lower bound 
the true free energy, which in turn lower-bounds the ground 
state energy — thus providing a natural complement to varia- 
tional approaches which upper-bound the ground state energy. 
The ability to provide a rigorous ground-state energy window 
is a powerful validation tool, creating a very compelling rea- 
son to use this approach. 

In this paper we paper we present a pedagogical introduc- 
tion to MED, including numerical implementation issues and 
applications to 2D quantum lattice models in the thermody- 
namic limit. In Sec. HIl we give a brief derivation of the 
Markov entropy decomposition. Section |III] outlines a robust 
numerical strategy for optimizing the clusters that make up 
the decomposition. In Sec. |IV] we show how we can extend 
these algorithms to extract non-trivial information, such as 
specific heat and susceptibilities. We present an application of 
the method to the spin- 1/2 XXZ model on a 2D square lattice 
in Sec.[V] describing how to characterize the phase diagram 
and determine critical points, before concluding in Sec. [Vl] 



n. MARKOV ENTROPY DECOMPOSITION 

In this section we will briefly review the strategy behind 
MED, as previously detailed in piOf| . We begin in the setting of 
a quantum lattice model with a Hamiltonian H that is 'local' 
according to some graph (e.g. a regular 2D square lattice). 
For instance, if H contains only two-body terms, then we can 
decompose 
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where <ij> denotes neighboring sites. The goal is to iden- 
tify the equilibrium properties of the system at temperature T, 
such as calculating local expectation values (e.g. energy) or 
determining if the state is in a symmetry-breaking phase. 

The global density matrix pQ describing the equilibrium 
state at temperature T is the one that minimizes the free en- 
ergy. 



F = E -TS = Ti HpG + kBTpahipG , 



(2) 



where fcs is Boltzmann's constant and Tr [x] is the trace of x. 
The well-known solution to this is given by the Gibbs ensem- 
ble. 



PG = cM-PH)/Ti exp(-/3F) 



(3) 



where /? ~ l/ksT. However, direct evaluation of Eq. (O is 
prohibitively expensive for large systems, primarily because 
the dimension of pG and H is simply too large. 

In this approach we will store in memory a set of small 
density matrices {pi}, each corresponding to the reduced state 
of the system on a set of lattice points (or cluster) Ci. For a 
consistent description of the state, we must impose that where 
two clusters Ca and Cb partially overlap, the corresponding 
reduced density matrices on their overlapping region Ca H Cb 
must agree. We refer to this condition as local consistency. 

Given this description of the global state, it is possible to 
calculate the expectation value of local quantities, such as 
the individual terms (hij) and thus the total energy E = 
(H). On the other hand, calculating the system entropy 
S = — /cfiTr [pg In pc] requires knowledge of the global den- 
sity matrix. 

Nevertheless, we can obtain an approximate value of the 
system entropy by considering only small segments of the sys- 
tem. We begin by ordering the lattice points according to an 
integer index k. In ID, the natural ordering is trivial; for 2D 
systems we use the "typewriter" convention of beginning in 
the top-left, then ordering left-to-right first followed by top- 
to-bottom (see Fig.[T](a)), and this is simple to generalize to 
higher dimensions. Note that the choice of ordering is arbi- 
trary — and each choice will define a unique MED simulation, 
with possibly different computational cost and accuracy. 

Using this ordering, we can decompose the total sys- 
tem entropy into a sum of conditional entropies. Denoting 
S{si, S2, . . . ) as the von Neumann entropy of the reduced 
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FIG. 1 : A two-dimension system of 25 spins on a square lattice is de- 
picted, and indexed in a left-to-right then top-to-bottom "typewriter" 
ordering, (a) The quantitity 5(1811, ...17) is the difference between 
the entropies in the region bounded by the blue, solid line, and the re- 
gion bounded by the green, dashed line, (b) We can approximate this 
difference using the smaller regions A4ig and A4ig \ 19, depicted 
here, resulting in an upper-bound of the conditional entropy as-per 
Eq. mi. 



state on sites si,S2, 
S{1,...,N) = 



. . . , we have 

Sil) + S{1, 2)^8(1) + ... 

+ S{l,...,k)~ Sil,...,k-1) + 

+ S'(l,...,iV)-S'(l,...,iV-l) 

N 

Y,s{k\i,...,k-i), 

k=l 



(4) 



where S{A\B) is the conditional entropy of A given B, equal 
to S{A,B) — S{B). In a typical thermal state, one would 
expect that the site k would only be strongly correlated with 
sites that are close to it (according to the interaction graph, 
say). We consider the set of sites (cluster) Mk that consists 
(a) only of sites with index less than or equal to fc, and (b) are 
deemed to be in the neighborhood JVk of site k (including k 
itself), thus Mk = A/fe n {1, fc}. We also define A^J, = 
Aik\ k, the cluster without k itself. In all that follows, the 
primed symbols will relate specificaUy to the reduced clusters 

Strong sub-additivity of the von Neumann entropy H 
gives the inequality 



then 



S{k\l,...,k-1) < S{k\M'k). 



(5) 



As the cluster Aik grows to include more of the sites that k is 
correlated with, the approximation becomes more accurate. In 
fact, in the special case that the system is a quantum Markov 
network, we can specify a minimal cluster Mk such that the 
entropy decomposition is exact (typically, a thermal state is 
not a quantum Markov network unless the Hamiltonian con- 
sists of commuting terms lfl3ll ). Further, the result can also be 
seen to be exact 111 Oil when the entropy of the cluster scales 
simply as S" = a X volume + /? x boundary + 7 (where a is a 
volumetric term coming from thermal fluctuations, the /3 term 
originates from short-range correlations or entanglement, and 
the 7 term is a topological contribution). Such a scaling is 
the expected fixed point of the renormalization flow for non- 
critical systems, and thus after sufficient course-graining (or 
equivalently, growth in the cluster size) the approximation will 
be quite accurate. 
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Equation Q implies that the total system entropy is over- 
estimated 



(6) 



and therefore, we can wnt/erestimate the free energy of the 
total system 



F>F^E-TS^E- Tj2s{k\M'k). 



(7) 



using only local, reduced density matrices on the clusters Aik- 
These relations could be applied in a variety of settings, from 
estimating the entropy and free energy in an experimental sys- 
tem, to building a numerical technique as we discuss here. 

We can attempt to turn the above into a numerical method 
by minimizing the (approximate) free energy F over the set 

of globally consistent density matrices. We say the set of 
density matrices is globally consistent {{pk} G ^) if and only 
if there exists a positive, global density operator which agrees 
with each reduced density matrix after the appropriate partial 
trace. 

Because determining global consistency is extremely diffi- 
cult, we must make one further approximation in order to pro- 
ceed. We instead allow any set of operators {pk} belonging 
to the enlarged family D of locally consistent operators, 
which agree on their overlapping sections; 



Tr Pi = Tr pj 



(8) 



The MED algorithm now aims to determine the set of den- 
sity operators {pk} G ^ that minimizes F. Because we have 
only enlarged our search space (fi D ^Y), this solution also 
gives a rigorous lower bound to the true free energy. 



mini*" > min_F > min_F. 
o o o 



(9) 



Further, because the ground state energy Eq > F for all tem- 
peratures, we simultaneously have a lower bound for Eq. 

The full definition of the minimization problem is to mini- 
mize the function (where p'^. = Trfe[pfc]) 

F = ^Tt ^Hpk + ksTpk log Pk - Tr [ksTpk log Pk] , 

k 

(10) 

over the set of density matrices that are Hermitian, positive 
semi-definite {pk h 0) with unit trace, and are locally consis- 
tent (i.e. obey Eq. ([8]l). At T = this reduces to a much eas- 
ier semi-definite program (for examples, see Refs. lfl"6l - [l8ll ). 
Note that F may not be monotonic with T, and better bounds 
on Eq may be found with the MED at finite temperature, as 
can be clearly seen in Fig.|5] 

Note that this approach can easily be applied to infinite, 
translationally invariant systems. If the system is translation- 
ally invariant, the reduced density operator will appear the 
same for all k (note that even in a symmetry-breaking phase, 
the thermal state is a mixture of all sectors). Thus each term 
in Eq. ( fTOl i is equal, and it suffices to minimize over just one. 



locally consistent density matrix. On the other hand, for non- 
translationally invariant systems containing N sites (or a unit- 
cell of TV sites), a complete set of N clusters should be used. 
Care should be taken at the boundaries of a finite, translation- 
ally invariant system (e.g. the first cluster always contains just 
one site, even for periodic systems). 

Before we continue to the specifics of numerical optimiza- 
tion in Sec. |III1 we wish to point out one of the general 
properties of MED: the total minimization problem is convex 
(Eq. ( fTOl i is convex, the set of unit-trace, positive density oper- 
ators is convex, while the local consistency conditions induce 
an linear subspace that the search is constrained within). This 
implies that there is just one unique minima to the problem, 
and as the objective function [Eq. ( fTOt I is smooth, robust and 
reliable numerical techniques allow us to determine the min- 
ima to great precision. This is particularly important when 
claiming a lower-bound to the free energy — a claim which 
relies on an accurate minimization. 



III. OPTIMIZATION 

In this section we will discuss numerical strategies for min- 
imizing Eq. ( fTOl ). To solve this constrained optimization prob- 
lem, we use a combination of interior-point potentials and 
subspace projection, as discussed in Sec. IIII Al We will use 
the first- and second-derivatives to minimize Eq. ( fTOl i; numer- 
ically efficient strategies for calculating these are presented 
in Sec. IIII B I In Sec. IIII CI we present reliable convex opti- 
mization algorithms based on nonlinear conjugate-gradient or 
Newton's method. 



A. Constraints 

The minimization problem is constrained in two ways. 
Firstly, each pi must be (semi-) positive-definite. Secondly, 
the density matrices must be locally consistent, and have unit 
trace. The latter constraints limit the search space to an affine 
subspace, discussed further below. To ensure we satisfy the 
former, we include an interior-point (IP) potential ||20|] . pro- 
portional to the control variable p, into the cost function: 

F = ^ Tr (^Hpk + ksTpk log pi + pL log pk 

k 

-Ti-{kBTp'^logp',). (11) 

The additional term creates a logarithmic potential barrier sur- 
rounding the surface of singular density matrices. The loga- 
rithmic form is by far the most commonly used IP potential, 
being sufficiently strong to ensure stability while decaying ex- 
tremely quickly away from this surface. In this case, it also 
preserves the convexity of the cost function. By carefully 
choosing p, and reducing it appropriately, we can converge 
to the (unique) minimum within the space of positive density 
matrices, while removing instability issues in nearly-singular 
regions. 
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The remaining constraints impose that local consistency 
(Eq. ([8]l) and the correct normalization, 

= 1. (12) 

Equation ([8]l can be written more conveniently in the form 



Tr /5, 

M^\Mj 



Tr p, = 



(13) 



Equations (112113b are first-degree polynomials in the search- 
space variables (i.e. the elements of pi), and define an affine 
subspace. It is well-known that imposing affine constraints 
onto a convex optimization problem preserves convexity ll20ll . 

The space of differences between two normalized, locally- 
consistent states is linear. Thus, we can satisfy Eqs. (112113b 
exactly by beginning with a locally-consistent state (e.g. the 
fully mixed state), and projecting all updates Api to the vari- 
ables onto this linear subspace. The remainder of this subsec- 
tion is devoted to how to construct the appropriate orthogonal 
projector correctly and efficiently. 

Formally, we wish to construct a projector V such that 
Vlxi, xn} returns the closest (according to the L2-norm) 
set of numbers {yi, yjv} satisfying 



Tr Ui ^ 0, 



Tr iji — Tr 2/?" 

M,\Mj Mj\Mi 



0. 



(14) 



(15) 



One method of performing this would be to simply construct 
the (sparse) super-operator V in memory and perform the pro- 
jection as a matrix-vector multiplication. However, the di- 
mension of the super-operator is too large for this method to 
be efficient. Here we discuss how to perform this projection 
in a step-wise fashion. 

Firstly, the traceless constraint Eq. ( fT4] i is orthogonal to the 
remaining constraints and thus can be enforced independently 
(simply by calculating the complete trace, and subtracting the 
appropriate identity operator). As different terms in Eq. (fTST i 
are generally not orthogonal constraints, there are some sub- 
tleties in the projection algorithm. 

To demonstrate this subtlety, we can try to enforce a sin- 
gle constraint of the form Eq. ( fTsT i, for instance the two-body 
term in Fig. |2](a). Here we are working in the framework of 
translationally invariant systems, but the same principles hold 
for generic systems. Given an operator pi23 (on sites labeled 
1,2,3), we wish to impose pi2 = P23 (taking the appropriate 
partial traces). As a first-guess, we can try enforcing that these 
two operators become their average, {pi2 + p23)/2: 



P123 — P123 



-pi2 + P23 



I3 
d 



li 
d 



P12 — P23 



(16) 



However, the problem with this is that p'12 7^ P23 unless it 
is already true that Pi = P2 = Ps (see Fig. [13] (b)). So, 
even though this one-site constraint is actually implied by the 
two-site constraint, we must enforce the one-site constraint 



(a) 



(b) 



FIG. 2: A cluster of three sites on a translationally invariant chain 
must obey certain local consistency conditions, (a) The two-site con- 
dition pi2 = P2S arises from translating the cluster by just one site, 
(b) The one-site conditions pi = p2 = pa can be seen from two 
separate translations of the cluster. Note that although (b) is a direct 
consequence of (a), both must be taken into account when construct- 
ing the action of the projector V. 



manually first. Observe that the algorithm: 



P123 



P123 



-2pi + /32 + P3 I2 



d2 



ll P1-2P2+P3 I3 

d 3 d 

I12 Pi + P2 - 2p3 

^® 3 ' 



P123 



P123 



-p'i2 + p'23 ^ is 



ll ^ P'l2 - P'23 



(17) 



p'k- 



performs an orthogonal projection that enforces p'/j 

The projection algorithm proceeds as follows. First we 
identify any one-site constraints, which are enforced. Then we 
enforce the two-site constraints, followed by three-site con- 
straints, and so forth. The problem now falls to identifying all 
the n-site constraints of the problem, and carefully discarding 
any that are unnecessary. 

Take for instance a contiguous cluster of iV-sites on a ID, 
translationally-invariant chain, as in Fig.[3](a). Then the min- 
imal set of sub-constraints to project into the translational in- 
variant subspace consists of all the contiguous sub-clusters of 
size 1, — 1 (see Fig. [3] (b)). Any constraints on non- 
contiguous sub-clusters are implied by (but not required by) 
the smallest enveloping sub-cluster. The projection V{p] is 
performed by a series of A^ — 1 steps, similar to Eq. ( [TT] ). 

The solution in higher-dimensions may be more compli- 
cated. For a regular square lattice, translational-invariance 
on a rectangular cluster requires consecutively imposing the 
appropriate constraint on all the rectangular sub-clusters, in 
order of size (see Fig. [3] (c,d)). In general, each element of 
the translation group may generate a sub-cluster constraint; 
by taking the original cluster and translated copy, the overlap- 
ping region coiTesponds to a constraint. A relevant example is 
depicted in Fig.|4](a,b). The projection V requires six distinct 
steps involving the sub-clusters in Fig.|4](b) 

For Hamiltonians with a larger symmetry group (such as ro- 
tational or reflection symmetry), it becomes possible to inves- 
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FIG. 3: Example clusters and their relevant sub-cluster consistency 
conditions, (a) A cluster of 5 sites on the translationally-invariant, 
infinite chain, (b) These four sub-clusters are required to enforce 
local consistency, construct the projector V. The number of unique 
translations where the sub-cluster fits entirely in the original cluster 
(a) is marked beside each, (c) A 3 x 3 cluster on the square lattice, 
(d) In this case, all rectangular sub-clusters are required to construct 
the projector V. 
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FIG. 4: A more complicated cluster and its relevant sub-cluster con- 
sistency conditions, (a) An cluster of 7 sites on the square lattice. 
This cluster A4k may be used to estimate the entropy of the high- 
lighted site k conditioned on sites with index < k, i.e. S{k\{< k}). 

(b) Six sub-clusters are required to enforce translational invariance. 

(c) Sub-clusters for translational, rotational and reflection symme- 
tries. Sub-clusters related to each other by rotation and/or reflection 
should be dealt with simultaneously. Some sub-clusters have inter- 
nal rotational or reflect symmetries — the total number of internal 
symmetries is labeled by IS. 



FIG. 5: MED predictions of the free energy for the antiferromag- 
netic, spin-I/2 XY model (Hamiltonian Eq. J38b with A = 0, J = 1) 
on the square lattice, using the 7-site cluster depicted in Fig. |4] The 
solid, red line corresponds to results when enforcing all point sym- 
metries of the square lattice, while the dashed, blue line comes from 
simulations enforcing only translational invariance. The larger sym- 
metry group results in a strictly greater free energy, and a higher 
bound on the ground state energy (Eq > -2.26912 or -2.27068, 
respectively). 

tigate all sub-clusters generated by the total symmetry group 
(and the closure of this originating from overlaps between 
sub-clusters — see Fig.|4](c)). This will lead to a larger set of 
constraints to impose, but improves the accuracy of the MED 
approximation because the set of allowed density matrices fl' 
is reduced (see the comparison in Fig. |5]). Occasionally, this 
procedure generates a sub-cluster constraint that is a direct 
descendent of another, and is therefore unnecessary to apply 
(detecting these cases is simple by direct inspection). 

The construction of the projector V is just one major ele- 
ment of our optimization procedure. In the next section we 
use V to calculate derivatives of F' within the constrained 
subspace. 

B. Calculating the Derivatives 

The optimization will proceed by treating the elements of 
the density matrix as a list of variables, which we vary to find 
the (unique) free energy minimum. The first- and second- 
derivatives of the free energy with respect to these variables 
can be very helpful in determining how to vary these vari- 
ables to approach the minimum. The results of matrix calcu- 
lus ll2ll I22II will allow us to find the derivatives of F w.r.t. 
Pi- 

The first derivative is relatively straightforward to compute. 

^=H + ksT (^og P^-^® log + (18) 

Recall that the primed variables p'j, correspond to the reduced 
density matrix on the cluster excluding k itself, i.e. /5'j, = 
Ttfe pfe. 

However, this does not take into account that the variables 
are constrained to the locally-consistent subspace. The pro- 
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jected derivative 



dF 
dpN 



(19) 



lives entirely in the correct subspace, returning the same inner- 
product on locally-consistent displacements. The projected 
derivative is efficient to compute, typically by directly diag- 
onalizing pk and p'f. to calculate the matrix logarithm. The 
total cost scales as 0{Nd^"-), where N is the number of dis- 
tinct clusters (just one, for translationally invariant systems in 
the thermodynamic limit), d is the Hilbert space dimension of 
each lattice site, and n is the number of sites per cluster 

Using the projected derivative, we can minimize F using 
a projected-subspace, derivative-based method, such as the 
steepest-descent, nonlinear conjugate-gradient or L-BFGS al- 
gorithms. Conjugate gradient was used in Ref. ifioll to mini- 
mize the free energy, however as it was observed there, accu- 
rate minimization is challenging, while convergence is quite 
slow. We explain that this behaviour is due to stiffness, below. 

Using the results of matrix calculus (see the Appendix) 
we obtain the second-derivative, or Hessian super-operator, 
which has the form 



d^F 

dpidpj 



= 0, 



(20) 



when i ^ j, and 



d'^F 



U} ®UT ■ diag ]Vh ■ U^ ® Ui 



(21) 



- cy • C/;t ^ u'^i . diag Ml ■U^'^U^ ■ pt, 



where pi = U-DilJi and p^ ~ U'^ D'Jj[ are the eigenvalue 
decompositions of pi and p'^, while diag(i) is the diagonal 
super-operator that multiplies element-wise by the entries of 
X. The diagonals of matrix Mi is defined as 



(MO,, -fcBT(A)7/-M(A)-', 



'33 



(22) 



and the off-diagonals 



fcBr(ln(A),, - In(A)fcfc) - - mkl) 



(23) 



M'l is defined similarly based on the values of D[: 

{M'^,,^kBT(b[)Tl, (24) 



kk 



(25) 



kk 



Finally, the partial-trace super-operator pt performs the partial 
trace over site k, for example, p'f, = Tr^ pk = pt{/5fc}- The 
quasi-inverse of this operation, the expansion super-operator 
ey, adds the completely mixed site at position fc, so ey {/S^.} = 
p',, ® i/d. 



The Hessian on the locally-consistent subspace can be 
found by simply applying the projection super-operator be- 
fore and after the second-derivative. 



V^F = V ■ diag • 



d^F 



(26) 



N 



In this equation, diag is the expansion into a matrix over the 
set of clusters i, j (representing the diagonal structure given 
by Eq. (|20|). 

Overall, the action of the Hessian is efficient to compute, 
requiring time scaling as 0{Nd^"), similar to the derivative. 
Thus, it is feasible to use Hessian-based optimization strate- 
gies, particularly given that the diagonalization of pk and pj^ 
may be the most expensive operations. In the next section we 
discuss a preconditioned Newton's algorithm which we have 
implemented here. 

Finally, we wish to discuss the issue of stiffness in this 
minimization problem. By inspection of the unprojected Hes- 
sian, Eqs. ( I22H25] ). we can determine a simple scaling of the 
stiffness as a function of pk- To begin with, the first term 
in Eq. ( |22] | dominates (when p = 0); it is positive-definite 
(as entropy is convex), while the second term is negative- 
definite (for the same reason), and the sum is known to be 
positive semi-definite (because F is convex). Thus, the sub- 
tracted term will typically increase the stiffness of the Hes- 
sian (this can be proven in the case that pk and p'^. can be co- 
diagonalized). The first term is simply the matrix derivative 
of In Pk, which very roughly speaking scales like p^^. There- 
fore, we expect the stiffness of the problem to be related to the 
ratio of eigenvalues in pk, which for thermal states increases 
exponentially with inverse temperature /3. It is in-fact the very 
small eigenvalues of pk that lead to the minimization difficul- 
ties seen in Ref. jTojl . Fortunately, the interior-point potential 
proportional to p ameliorates this problem signficantly by ar- 
tificially increasing the small eigenvalues of pk- At low tem- 
peratures we expect that these small eigenvalues, and thus the 
stiffness, scale Hke 0{l/pT). 



C. Optimization algorithm 

We have made the following three major observations: (a) 
the minimization problem is convex, (b) the action of the Hes- 
sian matrix is relatively efficient to compute, and (c) the min- 
imization problem can be very stiff. Given these ingredients, 
Newton's method is a natural and simple choice for accurately 
determining the minimum. 

The updates in Newton's method are straightforward to de- 
fine: 



Pi 



h {y^F^ |vf| , 



(27) 



where h will usually equal 1, except when the step-size 
is detected to be too great, in which case a simple back- 
tracking line-search is utilized ll23ll . As it is too expen- 
sive to construct and invert the full Hessian matrix, a (lin- 
ear) conjugate-gradient algorithm can be used to determine 
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(v^F^ |vf|. However, as the Hessian matrix is quite 

stiff, preconditioning can reduce the number of conjugate gra- 
dient steps dramatically. 

A preconditioner can be constructed by finding a matrix 
which is an approximation to the inverse of the Hessian. 
In general, the projection V, and each of the two terms in 
Eq. ( |22] |. do not commute. As a simple approximation, we 
invert the positive term in Eq. (|22] | (i.e. ) and apply the pro- 
jection as-if it did commute: 

T^r- {diag T,} • r (28) 

Here, Ti is simply the inverse of Mi. 

T, = ul®Uj' ■ diag U ■ Ur ® U^* 



1 



-2 ' 



(29) 



(30) 



{U)]k 



(A)j, - (A) 



kk 



kBT(ln{D,),, - HD,)kk) - /^((A)7/ - 

(31) 

Note that this preconditioner is also suitable for a pre- 
conditioned nonlinear conjugate gradient or L-BFGS search. 
We found that this preconditioner significantly accelerates 
the inner-CG solver in our Newton's method implementation. 
However, the preconditioned Hessian T • V^F remains quite 
stiff at low temperatures — numerical investigation leads us to 
believe that the approximate treatment of the projector V has 
a larger impact than neglecting the second term in Eq. ( |22] |. 

As the optimization progresses, we can estimate the er- 
ror in our solution using the Hessian matrix, or its approxi- 
mate inverse T. Thus we can use VF ■ (V'^F)^^{VF} for 
an determination of the proximity to the true minimum, or 
VF • T{VF} for a rough estimation. If sufficient CG steps 
are taken, we observe super-linear convergence. 

Because our stopping criteria depend on the derivatives, this 
minimization can proceed well beyond the numerical accu- 
racy of F itself, leading to a very accurate determination of 
the density matrices p;, as well as very rigorous bounds on 
F and Eq (extrapolation as ^ -> may be necessary before 
claiming lower bounds). 

Finally, we wish to point out one possible way to improve 
our optimization strategy. As an alternative approach to the 
interior-point potential to maintain positivity, one could use a 
positive representation, such as the square-root of the density 
matrices. 



Pi 



A] A,, 



(32) 



This decomposition is not unique, but we could constrain Ai 
to be Hermitian. Optimization could then proceed over the 
variables Ai. A similar analysis of the Hessian matrix to 
above leads us to believe that the optimization problem be- 
comes less stiff. On the other hand, the local consistency 
constraints are now quadratic polynomials of the search-space 
variables, and so a Langrange-multiplier approach may be 
necessary. Also note that small eigenvalues of pi could be 
removed entirely by using rectangular matrices Ai. 



IV. EXPECTATION VALUES AND SUSCEPTIBILITY 

Before proceeding to numerical simulations in Sec. |V] we 
discuss how to determine physical quantities of interest from 
the MED. Although we have explicit access to local informa- 
tion, stored in the reduced density matrices pi, more sophisti- 
cated techniques yield global properties such as the magnetic 
susceptibility or specific heat. 

The presence of long-range order is not simple to detect in 
the MED formalism. To begin with, we do not have an al- 
gorithm for determining long-distance correlations (except in 
ID, where belief propagation ll24ll may be appropriate). In a 
symmetry-broken phase, the free energy is minimized by the 
equal mixture of the relevant symmetry sectors. Further, this 
mixed state is the unique minimum of the MED optimization 
problem, because the cost functional F (Eq. ( fTOl )) is both con- 
vex and fully analytic (smooth). Therefore, we do not expect 
to observe either long-range order or explicit symmetry break- 
ing. In fact, because F is always smooth, we do not expect to 
observe phase transitions at all! 

This property is similar to exact-diagonalization of small 
systems; in-fact MED and exact diagonalization share many 
similarities, including scaling of numerical cost with n and 
strong finite-size effects. Nonetheless, we show here how to 
extract signatures of many -body cooperation and phase transi- 
tions in the thermodynamic limit, such as diverging magnetic 
susceptibility or specific heat. In Sec.[Vl we demonstrate that 
the many-body effects are much stronger in the MED than 
exact diagonahzation, even when using significantly smaller 
clusters. 

The specific heat dE/dT could be calculated by numeri- 
cal derivative of the energy at different temperatures; however 
it is also possible to determine directly from the equilibrium 
state at temperature T. The solution to the MED minimization 
always satisfies 

VF = = V^Hi+ keT Mogpi - ^ ® \ogp\ j (33) 



An infinitesimal change in temperature, T ^ T + dT, will re- 
sult in a small change in the solution pi ^ pi + dpi. Inserting 
this into the above and rearranging the differentials to find an 
equation for dpi / dT yields 



dpi 
dT' 



1 

= —V 
T 



(34) 



Although we have not determined an analytic solution, in 
Sec. unci we introduced an efficient preconditioned-CG al- 
gorithm for inverting the action of V^F. It becomes straight- 
forward to measure the thermal response at the end of each 
minimization. We can then simply calculate how any local 
quantity responds to the change in temperature, e.g. the spe- 
cific heat 



^^VTr 
dT ^ 



H. 



dpi 
dT 



(35) 
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This procedure can be generalized to the response with 
respect to perturbations to the Hamihonian. Take, for in- 
stance, the addition of a small, global magnetic field, such 
that Hi Hi + dBaf. Similar manipulations to the above 
yield 



dpi 
dB 



while the magnetic susceptibility is simply 

d{a^) 



dB 



dB 



(36) 



(37) 



We observe in the next section that the magnetic susceptibility 
calculated by MED can be many orders of magnitude greater 
than that of exact diagonalization in ferromagnetic symmetry- 
broken phases, while using less sites in the cluster 

This technique is quite powerful and flexible. Take, for 
example, a simple antiferromagnetic phase. The MED will 
return a mixture of symmetry sectors, yielding no breaking 
of translational invariance, and the minimization can be per- 
formed using just one cell. The staggered (or sublattice) mag- 
netization must be, by definition, zero in this translationally 
invariant ansatz. On the other hand, the staggered magnetic 
susceptibility is defined by the response to adding a perturba- 
tive, staggered magnetic field to the Hamihonian, which ex- 
plicitly breaks the original translational-invariance. For effi- 
ciency, we can transfer the zero-field solution to the checker- 
board lattice, and then add a perturbative, staggered magnetic 
field to the Hamihonian, repeating the above procedure to cal- 
culate the staggered magnetic susceptibility. Thus, the expen- 
sive step of minimization can always be performed using the 
full translational invariance of the Hamihonian, only breaking 
this for the relatively cheaper operation of determining the re- 
sponse to small fields. This can be extended to cells much 
larger than the MED clusters themselves, such as phases with 
chiral-order or similar 



V. XXZ MODEL 

In this section we benchmark the MED approach with the 
XXZ model on the 2D square lattice, and compare the results 
to those obtained with exact diagonalization of small systems 
with periodic boundary conditions. Unsurprisingly, we find 
that both the MED and exact-diagonalization approaches dis- 
play pronounced finite-size effects. We observe that the MED 
displays similar accuracy and sharpness of features using ap- 
proximately half the number of sites, compared to exact diag- 
onalization, while at the same time being much more flexible. 

The Hamihonian under consideration is 



(38) 



<ij> 



where <ij> denotes pairs of neighboring sites on a 2D square 
lattice. This model has a rich phase diagram containing sev- 
eral classical (i.e finite temperature) and quantum (i.e. zero 
temperature) phase transitions. 




FIG. 6: (Color online) Ground state energies of the XXZ model, as 
a function of a. The black lines corresponds to the MED results 
with clusters of different sizes. From top-to-bottom, we use an 8-site 
cluster (solid line), 7-site cluster (dotted line), 5-site cluster (dash-dot 
line) and 3-site cluster (dashed line). The solid line forms a rigorous 
lower-bound to the ground state energy. For comparison, the thick, 
red line and red points correspond to the exact product-state ground 
states, while the blue crosses represent Monte Carlo results for the 
Heisenberg Jll] and XY (2^ AFMs respectively. 



For the purpose of displaying the entire phase diagram, we 
rewrite the above in the form 



H = coa{na)a^a] + sin(7ra) [afa^ + afa]) , (39) 

<ij> 

A, 



where the anisotropy is given in terms of a by cot(7ra 
while the coupling strength J is fixed by sin(7rQ;) = J. 

Although no general analytical solution exists, certain val- 
ues of a correspond to well-understood systems. At a = we 
recover the classical antiferromagnetic (AFM) Ising model, 
while a = 1 corresponds to the ferromagnetic (FM) Ising 
model. The AFM and FM quantum XX model (A = 0) is 
given at a — 0.5 and 1.5, respectively. The AFM and FM 
Heisenberg model (A = 1) is given at points a = 0.25 and 
1.25, while a local unitary transformation maps a to 2 — a. 
The "Ising-like" FM phase with 0.75 < a < 1.25 have 
known, product-state ground states, with all spins pointing in 
the same direction. 

The Hamihonian has a U{1) symmetry corresponding to 
global rotations of the spin-vector about the z-axis, or equiv- 
alently the conservation of the z-projection of the total spin. 
This symmetry can be exploited numerically to decrease time 
and memory requirements, by storing the density-matrices pi 
in block-diagonal form. 

In Fig. |6] we show our numerical lower bounds of the 
ground state energy, and compare with some well-known 
points. For the regions with an exact product-state ground 
state, the MED is able to faithfully represent these and pre- 
dicts the correct results. In more entangled regions, the results 
simply lower bound the possible allowed energies — for ex- 
ample the Monte Carlo results included in Fig. |6] (accurate to 
many significant figures) lie above the MED prediction. The 
difference of a few percent is not surprising, given that a clus- 
ters of up to just 8 sites were used with no finite-size scaling 
analysis. 

The specific heat of a system can give an excellent indica- 
tion of the phase diagram. In Fig. [T] we plot the specific heat 



9 



at different values of a and temperature T, calculated by both 
MED and exact diagonalization with various cluster/system 
sizes. We observe that the MED produces the general features 
and sharpness of exact diagonalization simulations using far 
fewer sites (approximately half). The expected features of 
the phase diagram are already clearly visible for the exact- 
diagonalization using 16 (4x4) sites, and MED using just 7 
sites. 

In Fig. |7] we can clearly observe distinct boundaries be- 
tween the high-temperature paramagnetic phase, the ferro- 
magnetic (FM) and antiferromagnetic (AFM) discrete broken- 
symmetry phases, as well as the Berezinskii-Kosterlitz- 
Thouless (BKT) regions corresponding to systems with quasi- 
long range order (analogous to the classical rotor model in 
2D). At the phase boundary between the high-temperature and 
(anti)ferromagnetic phases, we expect to observer a narrow, 
divergent peak in the specific heat; however this is smoothed 
somewhat by finite-size (or finite-cluster) effects. 

The maximum of the specific heat gives a first approxima- 
tion to the location of the phase transition, and we have plotted 
this critical temperature in Fig.|8] The position of the peak is 
strongly affected by the number of sites in both the very small 
exact diagonalization systems and small MED clusters. This 
effect can be most easily quantified at the points a = or 1, 
where the known value of = 2/ ln(l + \/2) w 2.269 |22l is 
6.7% smaller than the 4x4 ED value of 2.421, and 1.8% larger 
than the 7-site MED value of 2.228. This analysis fails closer 
the the SU(2)-invariant quantum critical points (|A| = 1, 
or a = 0.25, 0.75, 1.25, 1.75) where the narrow peak is ob- 
scured by finite-size effects and the broad, smooth maximum 
expected at higher temperatures. The continuous, BKT transi- 
tion is generally more challenging to locate, especially when 
dealing with extremely small systems or, in our case, small 
MED clusters. 

There exist a myriad of more sophisticated, finite-size scal- 
ing techniques for determining the critical temperature (e.g. 
those employed in ll25ll26ll ). One popular finite-size technique 
is to investigate the crossing of the specific heat at different 
system sizes; in general these crossings will approximately 
coincide and converge rapidly to critical point with increas- 
ing system size L. Unfortunately, both exact-diagonalization 
and MED give poor results for these system sizes — in Fig.|2] 
we observe the smaller simulations don't even clearly recover 
four distinct low-temperature phases, and we do not have 
enough data to perform the crossover analysis. If one were 
to apply this approach using larger clusters, it is apparent that 
care must be taken with cluster boundaries (i.e. to compare 
clusters with similar shaped neighbourhoods A/fc) Just as sim- 
ilar boundary conditions should be used in scaling analysis 
of exact-diagionalization or Monte Carlo results. One further 
complication results from the MED method itself — the fact 
that F increases monotonically with larger cluster sizes adds 
constraints to its derivatives (i.e. the energy and specific heat) 
that potentially complicate the crossover analysis. A fuller 
understanding of scaling in the MED is the subject of future 
work. 

Finally, we can extract information on the behaviour of each 
distinct phase by investigating susceptibility. Although nei- 



ther MED nor exact-diagonalization of finite-systems displays 
explicit symmetry breaking, the onset of long-range order in 
the thermodynamic limit is predicated by rapidly growing sus- 
ceptibility with system size, caused by strongly-cooperative 
many-body behaviour. In Fig.|9]we show the magnetic suscep- 
tibility calculated by exact-diagonalization and the MED. In 
both cases, we can see that the magnetic susceptibility grows 
extremely rapidly in the ferromagnetic region, while being 
suppressed in the antiferromagnetic phase. However, the 
many -body behaviour is much stronger in the MED, where the 
susceptibility is orders of magnitude greater. The same analy- 
sis can be applied, for instance, to the staggered magnetization 
to characterize the anti-ferromagnetic region (Fig.|9](c)), giv- 
ing results as expected. 



VI. CONCLUSION 

The Markov entropy decomposition is flexible cluster- 
based method for finite temperature quantum calculations in 
any dimension or geometry, giving rigorous lower bounds to 
free- and ground-state energies. In this paper we have de- 
tailed efficient algorithms for optimizing the MED clusters 
and demonstrated its applicability to the XXZ model on the 
square lattice. We observe that MED simulations share many 
properties with exact diagonalization, both having exponen- 
tial scaling in cost with cluster size, strong finite-size effects, 
and the lack of explicit phase transitions or symmetry break- 
ing. On the other hand, the MED approach is much more 
accurate than exact-diagonalization for a given number of lat- 
tice sites, displays stronger many-body effects, and further is 
much more flexible in the types of geometries that are feasible 
to study. For example, one could study large, finite systems 
(e.g. harmonically-trapped atoms in optical lattices), systems 
with large unit cells (e.g. the hyperkagome lattice), or inves- 
tigate symmetry breaking over large unit cells or that are in- 
commensurate with the lattice (e.g. systems with chiral order). 
Further, the close relationship with belief propagation ll24ll 
leads us to believe that this method would be excellent for 
the study of disordered, glassy systems at finite-temperature, 
where correlation lengths are typically short, but the relevant 
global states are difficult to determine. The authors hope that 
this paper stimulates further work with this promising method. 
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Appendix 

In this Appendix, we derive the form the Hessian (Eqs. (|20] - 
|26] |) using matrix calculus. We begin with the unconstrained 
problem, ignoring the projector V. The first-derivative in the 
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FIG. 7: (Color online) Specific heat as a function of temperature and the parameter a. (a-d) Exact diagonalization results with (a) 2x 2 lattice, 
(b) 4x2 lattice, (c) 4x3 lattice, and (d) 4x4 lattice, (e-h) MED results using a cluster containing (e) 3-sites, (f) 5-sites, (g) 7-sites, and (h) 8- 
sites. We see from (c) that exact-diagonalization is very sensitive to system size and boundary conditions, while a variety of cluster shapes can 
be used with the MED. 



unconstrained subspace is given by Eq. ( fTSl l, restated here 

( 



dF 



H 



1 



ksT yog Pi - ^ ® logp- 



(40) 



Before continuing, lets examine matrix functions. A scalar 
function f{x) can be converted to the matrix function f{x) 
by performing it's action on the eigenvalues of x. That is, 
if (Hermitian) x has eigenvalue decomposition UDW, then 
f{x) = UEU'^ where /(•) is applied element-wise to D: 



E = 



/(A,i) 






/(A,2) 



(41) 



This is in-fact the meaning behind the terms in Eq. ( |40] |. 

Next we wish to examine how the matrix f{x) changes if 
an element of x changes. To simplify this matrix derivative, 
we will proceed to work in the basis that x is diagonal. First 
we examine what happens when a small change occurs to a 
diagonal element. 

uL)ii h^O 

- SkAkfiDu) (42) 

Note /' is the derivative of /. Unsurprisingly, we get 
dEu/dDu = nD,i). 

The off-diagonal terms are slightly more complicated. 
Adding a small perturbation to an off-diagonal element Dij 
leaves the matrix diagonal everywhere except the 2x2 sub- 
matrix involving rows and columns i and j. The problem can 



be solved by dealing just with this sub-matrix. 



dE^ 



dD. 



lim ■ 



Du h 
D,. 



-f 



D,, 
D,, 



Dii-Djj 





The derivative is zero otherwise. In the limit Z?, 



(43) 



Oi' the 

term approaches /'{Da). A simple way to see this result is to 
prove it for all functions f{x) ~ i". 



dE. 



dD, 



lim 






"A. 


h 


n 


■ A, 





n 












A. . 









(44) 



Combining this with a simple Taylor expansion of f{x) gen- 
eralizes this to Eq. ( l43T l. (The result holds for all differentiable 
/, regardless of the radius of convergence of the Taylor expan- 
sion). 

The tensor dE/dD is actually a diagonal, linear super- 
operator whose action is element-wise multiplication by the 
matrix M, where 



M = 



/'(A.i) 

f{Dl,l)-f{D2,2) 



f{Di 



'f{D2 



-D2 



Dl,l-D2.2 

/'(A,2) 



(45) 
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FIG. 8: (Color online) Critical temperature as a function of the pa- 
rameter a, extracted from the peak of the specific heat, (a) Re- 
sults from exact diagonalization, using systems sizes of 4x4 (black, 
solid line), 4x2 (blue points) and 2x2 (magenta crosses), (b) Re- 
sults from MED for clusters containing 8-sites (black, solid line), 
7-sites (blue points), 5-sites (magenta crosses) and 3-sites (green 
squares). Red points correspond to the analytic Ising model solution, 
^c = 2/ln(l + ^/2). 




FIG. 9: (Color online) Magnetic susceptibility as a function of tem- 
perature and the anisotropy parameter a. (a) Susceptibility in the 
4x4 lattice, (b) Susceptibility given by MED with an 8-site cluster, 
(c) Susceptibility of the staggered magnetization given by MED with 
an 8-site cluster. Note the logarithmic color-scale. 



To apply this operator, we must first transform to the basis 
where x is diagonal, by multiplying on the left by U and on the 
right by W . The super-operator that performs this operation 
is U ®U* . We then multiply element- wise by M, or rather 
apply the diagonal super-operator diag M, before performing 
the inverse unitary transform tj^®tj^ . Together, this becomes 



dm 

dx 



C/^ [/'^ • diag M-0®U* 



(46) 



This now explains the first line of Eq. (l22l i. Equations (I22l23l l 
come from inserting f{x) = ksT log x + into Eq. ( |45] |. 

Finally, a simple extension is required to deal with the re- 
duced density matrices p'^. First we split the Hilbert space 
into two components, that relating to site k, and the remainder 
of the cluster TW'j,. We define the (linear) super-operator that 
performs the partial trace over site k as pt, so that pipk = p'k- 
We define the quasi-inverse of this operator, the 'expansion' 
super-operator, that adds the fully mixed state on site fc, as ey, 
where ey pj, = pj, (g) 1 /d. The p'j, term in Eq. ( |40] | becomes 



fcBTey{log(pi{/5fc})}. 



(47) 



To differentiate this composition, we simply apply the stan- 
dard chain rule. 



dti{z} d\og{y) dpi{x} 



dy 



dx 



(48) 



where y = pti and z = logy. The linear super-operators ey 
and pt are equal to the derivative of their action (this is true 
for any linear super-operator), so this simplifies to 



dy 



(49) 



The central derivative can be defined in the basis where p'^. is 
diagonal, in much the same way as above. This leads directly 
to the second term in Eq. ( l22l i. 
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